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ABSTRACT: 

Many  realistic  protein-engineering  design  problems 
extend  beyond  the  computational  limits  of  what  is 
considered  practical  when  applying  all-atom  molecular- 
dynamics  simulation  methods.  Lattice  models  provide 
computationally  robust  alternatives,  yet  most  are 
regarded  as  too  simplistic  to  accurately  capture  the  details 
of  complex  designs.  We  revisit  a  coarse-grained  lattice 
simulation  model  and  demonstrate  that  a  multiresolution 
modeling  approach  of  reconstructing  all-atom  structures 
from  lattice  chains  is  of  sufficient  accuracy  to  resolve  the 
comparability  of  sequence- structure  modifications  of  the 
ricin  A-chain  (RTA)  protein  fold.  For  a  modeled 
structure,  the  unfolding-folding  transition  temperature 
was  calculated  from  the  heat  capacity  using  either  the 
potential  energy  from  the  lattice  model  or  the  all-atom 
CHARMM19  force-field  plus  a  generalized  Born  solvent 
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approximation.  We  found,  that  despite  the  low-resolution 
modeling  of  conformational  states,  the  potential  energy 
functions  were  capable  of  detecting  the  relative  change  in 
the  thermodynamic  transition  temperature  that 
distinguishes  between  a  protein  design  and  the  native 
RTA  fold  in  excellent  accord  with  reported  experimental 
studies  of  thermal  denaturation.  A  discussion  is  provided 
of  different  sequences  fitted  to  the  RTA  fold  and  a  possible 
unfolding  model.  ©  2007  Wiley  Periodicals,  Inc.^ 
Biopolymers  89:  153-159,  2008. 
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INTRODUCTION 

challenge  in  the  application  of  in  silico  protein 
design  is  the  problem  of  detecting  the  optimal 
sequence  fitness  of  a  protein  fold.  In  a  simplified 
approach,  a  sequence  is  mounted  onto  a  three- 
dimensional  structure  and  a  sequence-structure 
compatibility  profile  is  calculated  as  amino  acid  residues  are 
positioned  or  deleted  along  the  polypeptide  chain.  Certain 
structural  regions  are  thermodynamically  “stable,”  and  hence 
have  high  compatibility  scores,  whereas  other  regions  are  less 
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favorable.  While  there  are  many  computational  methods  that 
apply  force  helds  or  statistical  potentials  to  assess  compara¬ 
bility  of  sequences  fitted  to  single  protein  chain  structures, 
they  suffer  from  the  lack  of  ability  to  treat  unfolding-folding 
transitions  that  truly  govern  the  thermodynamics  of  protein 
stability. 

The  most  direct  approach  to  evaluate  sequence  fitness  is  to 
calculate  calorimetric  observables.  Generalized-ensemble  simu¬ 
lations'”^  offer  efficient  methods  that  naturally  invoke  the  con¬ 
cept  of  energy  landscape.  The  connection  to  thermodynamics 
from  landscape  theory  is  calculation  of  expectation  values  from 
the  density  of  states.  All-atom  molecular-dynamics  simulations 
provide  the  most  rigorous  sampling  method  to  generate  con¬ 
formational  states  over  a  wide  range  of  temperatures.  While  sig¬ 
nificant  progress  has  been  made  on  massively  parallel  all-atom, 
explicit  solvent  molecular-dynamics  simulations  of  protein 
folding,*^®  reported  studies  of  computing  a  heat  capacity  or 
other  calorimetric  observables  have  been  limited  to  small  pro¬ 
tein  systems.'”’"  Alternatively,  lattice  and  off-lattice  simulation 
models'^”'"'  are  computationally  tractable  approaches,  yet  their 
general  applicability  remains  uncertain  beyond  “toy  problems” 
or  well-defined  folding  models. 

In  this  brief  article,  we  examine  a  multiresolution  model¬ 
ing  approach  based  on  a  coarse-grained  lattice  model  applied 
to  a  realistic  structure-based  protein  design.  Ricin  A-chain 
(RTA)  is  an  N-glycosidase  that  acts  as  a  “ribosome-inactivat¬ 
ing  protein”  (RIP)  to  remove  a  specific  adenine  base  of  the 
28S  ribosomal  RNA  (reviewed  in  Ref.  15).  An  immunogenic 
polypeptide  was  recently  developed  for  RTA  as  a  modified 
RIP  fold  by  dedifferentiation  of  the  molecule  into  a  nontoxic, 
compact  single-domain  scaffold  for  presentation  of  protective 
B-cell  epitopes.'”  Experimental  melting  temperatures  were 
reported  and  indicated  that  the  altered  protein  fold  showed 
greater  resistance  to  thermal  denaturation  than  the  native 
RTA  structure.'”’'^  Here,  we  estimate  the  unfolding-folding 
transition  temperature  ( Tf)  from  the  heat  capacity  and  com¬ 
pare  our  results  with  experiments.  We  show  that,  regardless 
of  the  low  resolution  of  conformational  states  generated  from 
the  lattice  model,  the  potential  energy  function  of  the 
reduced  protein  representation  and  the  scoring  of  recon¬ 
structed  all-atom  structures  from  lattice  chains  using  the 
CHARMM19  forcefield  with  a  generalized  Born  (GB) 
implicit  solvent  model'"  are  sufficiently  accurate  to  yield  the 
correct  relative  change  in  Tf.  We  discuss  these  results  and  pro¬ 
vide  an  assessment  of  lattice  simulations  to  model  sequence- 
structure  fitness  and  unfolding  of  the  RTA  protein  fold. 

METHODS 

Generation  of  chain  conformations  were  based  on  Monte  Carlo 
sampling  of  a  cubic  lattice  using  the  MONSSTER  program  devel¬ 


oped  by  Skonick  et  al.  and  Kolinski  and  Skolnick.'^’^”  MONSSTER 
implements  the  SICHO  (Side  Chedn  Only)  model  where  each  amino 
acid  in  a  protein  chain  is  represented  by  a  single  virtual  particle 
located  at  the  side-chain  center  of  mass  and  projected  onto  a  cubic 
lattice  with  1.45-A  grid  spacing.  The  SICHO  model  forcefield  con¬ 
sists  of  short-  and  long-range  potential  energy  interactions,  hydro- 
gen-bonding  cooperativity,  and  a  mean  force  potential  that 
describes  hydrophobic  interactions,  and  is  given  by 

C  fishort  -f  filong  T  bn-bond  bhydro,  (1) 

where  each  term  contains  sequence-independent,  sequence-de¬ 
pendent,  and  restraint  components.  The  sequence-specific  poten¬ 
tial  was  derived  through  geometric  statistics  of  known  protein 
structures  and  accounts  for  short-range  interactions  between 
nearest  neighbors  along  the  polypeptide  chain,  as  well  as  long- 
range,  pairwise,  soft-core  repulsive  interactions.  Sequence-inde¬ 
pendent  potential  terms  model  protein-like  conformational  biases 
of  the  chain. 

To  enhance  the  exploration  of  the  potential  energy  landscape, 
we  applied  replica-exchange  simulations'””  implemented  in  the 
Multiscale  Modeling  Tools  for  Structural  Biology  (MMTSB).^' 
The  wild-type  ricin  A-chain  structure  (denoted  as  wt-RTA)  was 
taken  from  the  PDB  using  IDilrtc  and  crystallographic  waters 
were  deleted.  The  starting  structure  for  the  RTA  immunogen  was 
built  by  removing  the  residues  in  the  loop  region  34-43  and  the 
C-terminal  region  of  residues  199-267.  We  designate  this  modeled 
structure  as  RTAl-33/44-198.'”  The  remaining  two  modeled 
structures  are  RTA  with  residues  located  in  the  C-terminal  region 
199-267  replaced  with  the  hydrophobic  residue  He  (the  protein 
chain  is  denoted  as  RTA-Ile),  and  the  second  are  the  same  resi¬ 
dues  replaced  with  the  sequence  corresponding  to  pokeweed  anti¬ 
virus  protein  (PAP),  using  a  structure-structure  alignment  with 
the  PDB  structure  Ipag.  We  denote  our  hybrid  as  RTA/PAP.  The 
three  built  structures  (RTAl-33/44-198,  RTA-Ile,  and  RTA/PAP) 
were  each  subjected  to  structural  refinement  by  applying  steepest 
descent  minimization  and  adopted-basis  Newton-Raphson  mini¬ 
mization  for  a  total  of  100  steps.  The  forcefield  was  set  with 
CHARMM19  and  solvent  effects  during  minimization  were 
treated  by  a  protein  distance-dependent  dielectric  screening  model 
of  £  =  r.  Nonbonded  interaction  cutoff  parameters  for  electro¬ 
statics  and  van  der  Waals  (vdW)  terms  were  set  at  a  radius  of 
22  A  with  a  2-A  potential  switching  function. 

The  grid  size  for  the  cubic  lattice  was  set  at  a  value  of  500  lattice 
units  in  each  direction.  Selection  of  the  lattice  size  was  to  ensure 
adequate  sampling  of  unfolded  conformations  without  restricting 
the  protein  radius  of  gyration.  The  number  of  lattice  simulation 
cycles  at  each  temperature  was  set  to  200  and  the  number  of  Monte 
Carlo  moves  per  cycle  was  200.  CuUed  conformations  from  the  lat¬ 
tice  simulations  consisted  per  replica  of  4000  structures  extracted 
from  a  sampled  population  of  1.6  X  10^  conformations.  The  initial 
20  million  conformations  per  replica  were  applied  for  equilibration 
and  not  used  in  the  analysis.  A  total  of  32  replicas  were  used  and 
exponentially  spaced  from  a  reduced  temperature  T  of  0.8  to  2.0, 
where  T  is  normalized  by  a  reference  temperature  such  that  /)”'  = 
k^T  represents  the  energy  unit  (where  fcg  is  the  Boltzmann  con¬ 
stant).  We  set  T  =  1  to  represent  the  distribution  of  conformations 
modeled  by  the  SICHO  forcefield  at  298  K.  All-atom  structures 
were  reconstructed  from  the  lattice  simulations  by  using  the 
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MMTSB  procedure  developed  by  Feig  et  The  lattice-based 
structures  were  refined  by  a  combined  application  of  steepest 
descent  minimization  and  adopted-basis  Newton-Raphson  minimi¬ 
zation  for  a  total  of  100  steps.  To  change  the  representation  from 
lattice  to  all-atom  model,  we  selected  the  CHARMM19  force- 
field  as  a  stepwise  increase  in  resolution  and  modeled  solvent 
effects  during  minimization  by  setting  e  =  r.  Nonbonded  inter¬ 
action  cutoff  parameters  were  set  at  values  stated  above.  The 
final  scoring  of  the  reconstructed  structures  was  based  on  a 
CHARMM19/GB  model.*® 

Thermodynamic  properties  were  calculated  by  using  the  temper¬ 
ature  weighted  histogram  analysis  method  (WHAM).^®“^®  Given 
two  parameters,  for  example,  the  potential  energy,  U,  and  the  radius 
of  gyration,  Rg,  the  density  of  states  is  given  by 


n(u,Rg) 


(2) 


where  rij  is  the  number  of  data  points  in  the  jth  simulation,  I  is 
the  number  of  replica-exchange  temperatures,  and  fj  is  the 
inverse  temperature.  The  function  Ni{U,R^)  is  the  histogram  of  U 
and  Rg  calculated  from  the  ith  simulation,  and  f  is  the  scaled 
free  energy  obtained  by  solving  the  following  equations  self-con- 
sistently, 


Pli{U,R,} 


EtiN;(l/,Rg)exp(-/l[/) 

E|=i  njeMfj-  ^jU) 


(3) 


and 


exp(-^)  =  ^n([/,Rg)exp(-/f[/),  (4) 

U,R^ 


where  PpiU,  Rg)  is  the  probability  density  at  the  inverse  temper¬ 
ature  p.  The  heat  capacity  at  constant  volume,  Cf,  can  be  deter¬ 
mined  from  the  probability  density  by  the  following  expres¬ 
sion^® 


a(T) 


4  _{u^)-{uf 
hT^  hT^ 


where 


(U") 


U'‘Pp{U,R,) 

j:u,R,Pi>iu,R,) 


(5) 


(6) 


with  u  =  1  or  2. 


RESULTS  AND  DISCUSSION 

Figure  1  shows  the  calculated  temperature-dependent  pro¬ 
files  of  the  heat  capacity  function  for  each  of  the  RTA  struc¬ 
tural  models  (wt-RTA,  RTAl-33/44-198,  RTA/PAP,  and  RTA- 
lle).  The  temperature  corresponding  to  the  maximum  in  the 
Cy  function  is  modeled  as  an  apparent  unfolding-folding 
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FIGURE  1  Temperature-dependent  profiles  of  the  heat  capacity 
Cy  calculated  from  the  lattice  simulations  of  proteins  wt-RTA  (blue 
colored  line),  RTAl-33/44-198  (red  line),  the  hybrid  RTA/PAP  (long 
dashed  line),  and  RTA-Ile  (dashed-dot-dashed  line).  Heat  capacities 
were  computed  from  a  WHAM  analysis  of  an  equilibrated  set  of 
128,000  lattice  chains  extracted  from  a  total  sampling  of  over  5  X 
10*”  chain  configurations.  A  comparison  is  provided  of  Q  com¬ 
puted  by  scoring  the  all-atom  structures  reconstructed  from  the  lat¬ 
tice  chains  for  wt-RTA  (black  dashed  line)  and  RTAl-33/44-198 
(black  solid  line).  Superimposed  on  the  graph  are  the  starting  all¬ 
atom  structures  for  the  native  wt-RTA  fold  (corresponding  to  the 
blue-colored  profile)  and  a  model  of  RTAl-33/44-198  (corre¬ 
sponding  to  the  red-colored  profile),  where  the  loop  region  34-43 
and  the  C-terminal  region  199-267  were  removed  by  protein  engi¬ 
neering.*®  The  bumps  in  the  all-atom  Q  are  an  artifact  of  the  high 
T  conversion. 


temperature  of  the  protein  chain.  We  first  consider  the 
wt-RTA  structure  and  the  polypeptide  immunogen  (RTAl- 
33/44-198).  For  each  model,  the  simulation  yields  a  broad 
heat  capacity  of  unfolding-folding  transitions  with  a  fitted 
maximum  value  at  T{  of  1.592  or  roughly  474  K  for  the  wild- 
type  structure  and  a  fitted  Tf  of  1.616  or  482  K  for  the  immu¬ 
nogen.  As  described  further  below,  the  broad  nature  of  Cy 
over  the  wide  range  of  temperatures  indicates  that  the  model 
for  unfolding-folding  is  nearly  continuous  and  lacks  signifi¬ 
cant  cooperativity,  leading  to  what  can  best  be  described  as  a 
polyamorphic  transition  between  a  highly- frustrated  “glassy” 
state  and  a  configurationally  labile  protein.^^’^®  While  the 
SICITO  model  may  not  have  realistically  reproduced  the 
high-resolution  calorimetric  features  of  the  RTA  molecules, 
the  difference  given  by  ATf  between  the  two  structures  is 
approximately  8  K  and  is  in  remarkable  concurrence  with  the 
experimental  determination  of  roughly  7  K.*®’*^  This  correct 
prediction  of  a  small  change  in  protein  stability  needs,  how¬ 
ever,  to  be  cautiously  weighted  by  the  ambiguity  in  determin¬ 
ing  Tf  from  a  broad  Cy  function.  Although  each  heat  capacity 
has  converged  from  Monte  Carlo  sampling  over  5  X  10*°  lat¬ 
tice  configurations  and  the  error  in  the  WHAM  numerical 
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solution  is  negligible,  the  statistical  uncertainty  approaches 
the  scale  of  A  Tf. 

To  help  locate  the  maximum  transition  in  the  Q  function, 
we  applied  a  multiresolution  modeling  approach  and  recon¬ 
structed  all-atom  structures  from  the  lattice  chains.  After 
nominal  energy  refinement,  the  all-atom  structures  were 
scored  using  the  CHARMM19/GB  model.  Each  heat  capacity 
was  recomputed  using  the  CHARMM19/GB  energies  accord¬ 
ing  to  Eq.  (5)  and  the  values  were  linearly  scaled  to  allow  for 
a  direct  comparison  with  the  calculations  using  lattice  ener¬ 
gies.  We  note  that,  while  the  distributions  were  analyzed  by 
WHAM,  the  Boltzmann  factors  containing  CHARMM19/GB 
energies  were  not  derived  from  a  detailed  balance.  Without 
resorting  to  a  complete  aU-atom  simulation,  the  most 
straightforward  approach  to  achieve  the  transformation  of 
statistical  weights  from  one  distribution  to  another  is  to 
apply  an  umbrella  reweighting  technique  to  the  probability 
density  distribution  generated  by  the  lattice  and  use  as  cor¬ 
rective  terms  the  all-atom  energies  of  the  structures  that  are 
allowed  to  readjust  on  their  minimalist  energy  landscape. 
Given  the  molecular  size  of  the  proteins  in  our  study  and 
the  128,000  conformations  extracted  from  the  replicas, 
applying  this  approach  and  achieving  numerical  convergence 
of  the  population  distribution  profile  would  be  a  challenging 
task. 

Figure  1  compares  the  temperature-dependent  profiles  for 
the  all-atom  models  and  reveals  an  unambiguous  determina¬ 
tion  of  ATf,  yielding  approximately  15  K.  As  before,  the  cal¬ 
culated  trend  is  in  excellent  accord  with  experiments.  The 
effect  of  changing  the  statistical  distributions  to  all-atom 
energies  mimics  a  highly-ordered  system  (crystalline  vs. 
glassy)  with  a  percolation  to  random  chains.  Interestingly, 
the  profiles  now  resemble  those  of  off-lattice  models  with 
very  sharp  transition  peaks  (see,  e.g..  Ref  14).  In  either  case, 
the  lattice-generated  conformations  demonstrate  that  the 
protein  design  of  truncating  primarily  the  G-terminal  region 
yields  a  protein  fold  with  greater  resistance  to  thermal  dena- 
turation. 

To  test  further  the  sensitivity  of  the  SICHO  potential 
energy  function  to  model  sequence  compatihility  of  the  C- 
terminal  region  on  unfolding  rather  than  merely  a  random 
generation  of  lattice  chains,  two  additional  lattice  simula¬ 
tions  were  carried  out.  The  first  is  a  structural  model  of  a 
hybrid  protein  RTA/PAP,  where  the  region  199-267  is 
replaced  with  the  sequence  from  pokeweed  antiviral  protein, 
which  contains  greater  percentage  of  hydrophilic  residues 
than  RTA  (30%  vs.  15%).  PAP  is  a  single-chain  RIP  that  lacks 
a  B-chain  for  cell-binding  induced  translocation  and  it  cata- 
lytically  cleaves  the  same  specific  adenine  base  from  28  rRNA 
as  does  RTA.  A  very  similar  hybrid  protein  has  been  experi¬ 


mentally  tested  and  showed  no  significant  differences  in  ribo¬ 
some  specificity  and  catalytic  activity  from  that  observed  for 
RTA.^®  Although  there  is  a  predicted  small  decrease  in  the  Tf 
of  the  hybrid  protein  compared  to  wt-RTA,  our  simulations 
show  that  the  two  models  behaved  very  similarly  and  reflects 
the  sequence  fitness  in  their  heating  profiles  rather  than  arbi¬ 
trary  chain  excursions.  This  similarity  would  indicate  that 
the  sequence  divergence  between  RTA  and  PAP  in  the  C-ter- 
minal  region  does  not  markedly  alter  the  mechanism  of  ther¬ 
mal  unfolding. 

As  a  final  demonstration  that  the  lattice  simulation  model 
can  yield  insight  into  unfolding  of  RTA,  the  final  model  is 
construction  of  an  artificial  protein  by  replacing  the  C-termi- 
nal  region  with  all  isoleucine  residues.  The  temperature  pro¬ 
file  of  the  heat  capacity  for  the  RTA-Ile  model  displays  com¬ 
parable  heating  up  to  the  transition  point  of  the  immunogen 
(truncated  RTAl-33/44-198)  and  then  becomes  flat  until 
reaching  the  upper  bound  temperature  where  significant 
unfolding  occurs.  By  comparing  the  four  calculated  heat 
capacities,  the  simulations  suggest  a  theoretical  model  where 
RTA  unfolds  initially  through  a  “molten-globule”  state  con¬ 
sisting  of  a  partially  unfolded  N-terminal  domain  and  the  C- 
terminal  domain  is  unfolded.  Because  of  the  hydropho¬ 
bic  collapse  of  the  C-terminal  region  against  the  N-terminal 
domain  of  RTA-Ile,  the  protein  fails  to  dislodge  the  G-termi¬ 
nal  as  an  unfolded  state  until  a  high  temperature  is  reached, 
thus  the  initial  heat  capacity  appears  similar  to  the  truncated 
RTA.  The  rate-limiting  step  to  unfolding  is  the  N-terminal 
domain  and  the  early  onset  of  the  unfolded  C-terminal 
domain  lowers  the  Tf  of  the  wt-RTA  compared  to  the  trun¬ 
cated  molecule.  This  qualitative  model  of  unfolding  RTA 
is  consistent  with  reported  experimental  denaturation 
studies. 

Figure  2  illustrates  two-dimensional  (2D)  contour  maps 
of  U  versus  Rg  near  the  transition  temperature  for  the  lattice 
simulations  and  the  reconstruction  to  all- atom  models  for 
the  wt-RTA  and  RTAl-33/44-198  structures.  The  scale  of 
probability  density  distributions  is  tabulated  as  —log  Pp{  U, 
Rg),  where  Pp  is  given  by  Eq.  (8)  computed  near  or  at  the  Tf. 
We  define  U for  the  lattice  simulation  by  Eq.  (1),  and  for  all¬ 
atom  structures,  U  is  the  CHARMM19/GB  total  energy.  For 
application  of  lattice  energies,  the  contour  maps  at  the  Tf 
show  no  apparent  separation  among  the  structures  into  well- 
defined  conformational  basins  that  contribute  to  thermal 
unfolding  of  either  protein  molecule.  While  these  results 
could  have  been  anticipated  from  the  broad  heat  capacities, 
the  SICHO  model  is  suitably  parameterized  to  correctly  pro¬ 
duce  a  funnel-like  shape  in  the  conformational  energy  as  the 
protein  moves  from  low  Rg  values  of  folded  structures  to 
larger  values  representing  the  unfolded  state.  Nonetheless, 
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FIGURE  2  Two-dimensional  probability  density  distributions  as  function  of  the  potential  energy 
U  and  the  radius  of  gyration  R^.  Profiles  were  computed  at  the  observed  Tf  values  from  Figure  1 
and  their  populations  are  clustered  as  — logPyj(  U,  Rg),  where  Pp  is  the  probability  density  evaluated 
at  the  inverse  temperature  p.  Shown  is  a  comparison  of  lattice  energies  versus  CHARMM19/GB 
energies  for  RTAl-33/44-198  (top  two  panels)  and  the  native  wt-RTA  fold  (bottom  two  panels). 
Superimposed  structures  on  the  plots  include  the  hest  scoring  lattice-energy  conformations,  confor¬ 
mations  that  show  the  lowest  root-mean-square  deviation  from  the  starting  structures  (compare 
with  Figure  1)  and  the  unfolded  conformations.  A  native-like  conformation  for  wt-RTA  displays  a 
lattice  potential  energy  of  —1545  kcal/mol  and  for  RTAl-33/44-198,  a  potential  energy  of  —1164 
kcal/mol. 
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because  of  the  lack  of  a  rigorous  treatment  of  protein  interac¬ 
tions  and  solvent  effects  in  Eq.  (1),  the  lowest-energy  states 
are  more  compacted  than  native-like  conformations.  This 
outcome  is  a  characteristic  observation  of  statistical  poten¬ 


tials  applied  to  structural  refinement  of  de  novo  generated 
protein  models  using  conformational  energy  sampling 
techniques. Typical  applications  of  the  SICHO  model  is 
to  include  secondary  structure  as  a  bias  for  some 
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interactions,  ’  ’  while  our  simulations  lack  any  additional 
restraints  entered  into  the  force-field.  Introduction  of  sec¬ 
ondary  structure  biases  would  have  decreased  the  compact¬ 
ing  of  structures,  although  an  incorrect  strength  of  the  force 
constants  and  their  lack  of  temperature  dependence  can  dras¬ 
tically  affect  the  modeling  of  Tf. 

The  lattice  simulations  with  the  defined  grid  size  provided 
sufficient  sampling  of  conformational  space,  yielding  a 
wide  range  of  global  root-mean-square  deviations  (RMSD) 
of  the  backbone  coordinates  from  the  starting  structures.  For 
the  unfolded  conformations,  the  generated  ensembles  com¬ 
prise  RMSD  values  of  approximately  23-26  A.  The  lowest 
RMSD  from  the  starting  model  (comparison  of  structures 
shown  in  Figures  1  and  2)  for  RTAl-33/44-198  is  5.33  A  and 
for  the  native  RTA  structure,  the  value  is  5.05  A.  Although 
there  is  some  loss  of  /1-strands  for  both  near-native  RMSD 
structures,  the  lattice  replica-exchange  simulations  exhibited 
conformational  states  that  maintained  the  starting  folds  fairly 
well  and  preserved  the  structural  integrity  of  the  human  B- 
cell  epitopes.^^  Flowever,  the  population  density  of  native¬ 
like  tertiary  structures  is  low  and  consequently  their  energies 
make  only  marginal  contributions  to  the  broad  transition. 
The  transition  is  dominated  by  the  coexistence  of  multiple 
states  with  weak  structural  order  and  is  located  in  the  highly- 
dense  distribution  of  mid-range  energies  (dark-blue  colored 
clusters  in  Figure  2). 

In  contrast  with  lattice  energies,  the  scoring  of  recon¬ 
structed  aU-atom  structures  with  CHARMM19/GB  yields 
bifurcation  into  clusters  and  shows  an  unmistakable  free- 
energy  barrier  to  unfolding  at  the  Tf.  Because  of  the  structure 
regularization  of  each  lattice  chain,  there  appears  to  be  a  less 
pronounced  funnel-like  organization  among  the  most  popu¬ 
lated  conformations  (see  Figure  2).  The  general  failure  to  dis¬ 
criminate  structures  of  similar  energies  with  different  Rg  val¬ 
ues  illustrates  the  challenge  of  applying  all-atom  forcefields 
to  low-resolution  structures  and  is  the  outcome  of  a  mis¬ 
match  of  forcing  atomic  resolution  on  structures  containing 
steric  clashes,  unfavorable  packing,  irregular  chain  configura¬ 
tions,  etc.  Other  than  the  greater  disorder  of  the  wt-RTA 
clusters  and  larger  Rg  values,  it  is  difficult  to  extract  directly 
from  PpiU,  Rg)  the  effect  of  the  C-terminal  region  without 
having  highly-refined  conformational  states  and  their  struc¬ 
tures  for  input  into  a  principal  component  analysis.  Never¬ 
theless,  the  CHARMM19/GB  model  detects  a  transition  in 
short-  and  long-range  order  plus  interactions  of  the  contin¬ 
uum  solvent  environment  of  the  protein  conformations  that 
populate  the  highly-dense  clusters  at  the  Tf. 

There  are  several  alternative  approaches  that  could  con¬ 
ceivably  yield  improvement  in  the  resolution  of  the  folding¬ 


unfolding  free-energy  profiles.  One  possible  development  is 
to  replace  the  simple  pairwise  GB  solvent  model  and  the 
united-atom  G1TARMM19  forcefield  with  a  modified  analyt¬ 
ical  GB  molecular-volume  model^'^  combined  with  the  all¬ 
atom  CHARMM22  forcefield.  The  latter  method  is  known  to 
reproduce  more  accurately  Poisson  solvation  free  energies  of 
single  protein  chain  conformations.^^  A  more  rigorous 
approach  is  a  hybrid  explicit/implicit  solvent  model  to  treat 
at  the  aU-atom  level  the  local  hydrogen  bond  interactions 
between  protein  surface  atoms  and  water  that  are  missing  in 
GB  models,  while  modeling  the  bulk  hydration  by  an  analyti¬ 
cal  Poisson-based  reaction  field.^*’  This  approach,  although 
more  computationally  intensive,  eliminates  the  mean-field 
modeling  of  the  protein  dielectric  solvent  boundary  based  on 
continuum  electrostatics  and  applies  a  thermodynamic  inte¬ 
gration  method  for  determining  the  solvation  free  energy  of 
each  protein  chain.^^  It  is  also  possible  to  gain  improvement 
in  the  resolution  of  the  generated  protein  chains  from  recent 
reported  developments  in  lattice-based  reduced  models. 

In  either  case,  the  SICFIO  potential  energy  function  and 
other  reduced  model  functions  are  generic  in  their  represen¬ 
tations  and  are  similar  to  all-atom  functions  of  modeling 
protein  interactions.  Generalization  of  the  computational 
methodology  to  other  model  systems  is  straightforward; 
however,  the  applicability  is  restricted  to  modeling  small  to 
medium  sized  proteins  and  the  effects  of  large-scale  modifi¬ 
cations,  rather  than  the  more  detailed  problem  of  detecting 
free-energy  changes  from  single  amino  acid  substitutions. 
Despite  the  limitations  of  our  reduced  model,  the  end  result 
from  the  calculations  is  an  apparent  two-state  model  of 
unfolding-folding  with  the  protein  design  exhibiting  a  larger 
free-energy  barrier  to  thermal  denaturation  than  the  native 
structure. 

CONCLUSIONS 

In  this  study,  we  determined  that  a  multiresolution  model¬ 
ing  approach  of  pairing  a  coarse-grained  lattice  simulation 
model  and  all-atom  reconstruction  of  structures  from  lat¬ 
tice  chains  is  capable  of  modeling  a  realistic  protein  design 
problem.  Given  the  size  of  the  modeled  proteins  and  their 
structural  complexity,  the  potential  energy  functions  are  of 
sufficient  resolution  to  resolve  favorable  changes  in  the 
sequence-structure  compatibility  of  the  RTA  fold  which 
leads  to  an  improved  thermodynamic  stability.  Within  the 
framework  of  coarse-grained  simulations,  the  methodology 
of  multiscale  modeling  should  be  suitable  to  other  exam¬ 
ples  where  computing  relative  changes  in  the  protein  heat 
capacity  is  simply  outside  the  practicality  of  applying  all¬ 
atom  simulations. 
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